Tackling Multilabel Imbalance through Label Decoupling and Data Resampling Hybridization
Authors
Abstract
Learning from imbalanced data is a deeply studied problem in standard classification and, more recently, in multilabel classification as well. A handful of multilabel resampling methods have been proposed in recent years, aiming to balance the label distribution. However, these methods face a new obstacle specific to multilabel data: the joint appearance of minority and majority labels in the same data patterns. We recently proposed a new algorithm, called REMEDIAL (REsampling MultilabEl datasets by Decoupling highly ImbAlanced Labels), designed to decouple imbalanced labels that co-occur in the same instance. The goal of this work is to propose a procedure to hybridize this method with some of the best resampling algorithms available in the literature, including random oversampling, heuristic undersampling and synthetic sample generation techniques. These hybrid methods are then empirically analyzed to determine how their behavior is influenced by the label decoupling process. As a result, a set of guidelines on the combined use of these techniques is drawn from the experiments.
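The idea of decoupling labels that co-occur in one instance can be illustrated with a small sketch. This is not the REMEDIAL algorithm as published: the abstract does not describe how instances are selected for decoupling, so the sketch simply splits every instance that carries both kinds of label. The function name `decouple_labels` and the minority rule (a per-label imbalance ratio above the mean ratio) are assumptions made for illustration only.

```python
import numpy as np

def decouple_labels(X, Y):
    """Split instances that mix minority and majority labels.

    X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary label matrix.
    Assumption of this sketch: a label is 'minority' when its
    imbalance ratio (largest label count / own count) exceeds the
    mean ratio over all labels.
    """
    X = np.asarray(X)
    Y = np.asarray(Y)

    counts = Y.sum(axis=0).astype(float)
    # Guard against division by zero for labels that never appear.
    ratio = counts.max() / np.maximum(counts, 1.0)
    minority = ratio > ratio.mean()

    # Instances holding at least one minority AND one majority label.
    has_min = (Y[:, minority] == 1).any(axis=1)
    has_maj = (Y[:, ~minority] == 1).any(axis=1)
    mixed = has_min & has_maj

    # The original instance keeps only its minority labels ...
    Y_out = Y.copy()
    Y_out[mixed] = Y[mixed] * minority
    # ... and a clone with the same features keeps the majority labels.
    Y_clone = Y[mixed] * ~minority

    X_new = np.vstack([X, X[mixed]])
    Y_new = np.vstack([Y_out, Y_clone])
    return X_new, Y_new
```

The output contains one extra row per decoupled instance, and the total number of label assignments is unchanged. A resampling method could then be applied to the decoupled data, which is the kind of hybridization the abstract describes.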
Similar resources
Dealing with Difficult Minority Labels in Imbalanced Mutilabel Data Sets
Multilabel classification is an emerging data mining task with a broad range of real-world applications. Learning from imbalanced multilabel data has been studied intensively of late, and several resampling methods have been proposed in the literature. The unequal label distribution in most multilabel datasets, with disparate imbalance levels, could be a handicap while learning new classifiers. In ...
Addressing imbalance in multilabel classification: Measures and random resampling algorithms
The purpose of this paper is to analyze the imbalanced learning task in the multilabel scenario, aiming to accomplish two different goals. The first one is to present specialized measures directed to assess the imbalance level in multilabel datasets (MLDs). Using these measures we will be able to conclude which MLDs are imbalanced, and therefore would need an appropriate treatment. The second o...
MLSMOTE: Approaching imbalanced multilabel learning through synthetic instance generation
Learning from imbalanced data is a problem which arises in many real-world scenarios, so does the need to build classifiers able to predict more than one class label simultaneously (multilabel classification). Dealing with imbalance by means of resampling methods is an approach that has been deeply studied lately, primarily in the context of traditional (non-multilabel) classification. In this ...
An Ensemble Multilabel Classification for Disease Risk Prediction
It is important to identify and prevent disease risk as early as possible through regular physical examinations. We formulate the disease risk prediction into a multilabel classification problem. A novel Ensemble Label Power-set Pruned datasets Joint Decomposition (ELPPJD) method is proposed in this work. First, we transform the multilabel classification into a multiclass classification. Then, ...
Towards Label Imbalance in Multi-label Classification with Many Labels
In multi-label classification, an instance may be associated with a set of labels simultaneously. Recently, the research on multi-label classification has largely shifted its focus to the other end of the spectrum where the number of labels is assumed to be extremely large. The existing works focus on how to design scalable algorithms that offer fast training procedures and have a small memory ...
Journal title:
- CoRR
Volume abs/1802.05031, Issue -
Pages -
Publication date 2018